| name | geometry | |
|---|---|---|
| 0 | None | POINT (381166.6 6676424.438) |
| 1 | Uimastadion | POINT (385236.565 6674238.472) |
Lviv University
Objectives
GeoDataFrames.radius query to find all neighbors within specific distance apart from a given location.Details
spatial indices to make the queries more efficient.KD-Tree (K-dimensional tree) that can provide us information about K-nearest neighbors (i.e. not only the closest).KNN search in Python
scipy libraryKD-Tree spatial index.KDTree to build the spatial index which is available from the scipy.spatial submodule.KNN search in Python
In the following, we use the building_points and stops GeoDataFrames that we already used earlier to find three closest public transport stops for each building.
Let’s start by reading the data and reproject the data into a metric coordinate reference system (EPSG:3067) so that the distances will be presented as meters:
| name | geometry | |
|---|---|---|
| 0 | None | POINT (381166.6 6676424.438) |
| 1 | Uimastadion | POINT (385236.565 6674238.472) |
KNN search in Python
KDTRee index structure based on the Point coordinates.KDTree class expects the Point coordinates to be in array format, i.e. not as shapely Point objects which we have stored in the geometry column.numpy.array format by chaining a method .get_coordinates() with the .to_numpy() method as follows:array([[ 386623.30055697, 6672037.88387716],
[ 386623.99053928, 6672239.57164472],
[ 386629.00011373, 6672130.5382358 ],
...,
[ 393706.51571504, 6707260.21305267],
[ 391372.74617002, 6697807.78260742],
[ 388733.41604041, 6714694.15542812]], shape=(8377, 2))
KNN search in Python
We can now use the .query() method which goes through all the input coordinates (i.e. buildings) and very quickly calculates which of them is the closest, 2nd closest etc.
The method returns the distances to the K-number of nearest neighbors as well as the index of the closest stop to the given building.
By passing an argument k=3, we can specify that we want to find three closest neighbors for each building:
KNN search in Python
array([[ 92.67989301, 461.43820422, 466.16915044],
[400.24336963, 409.49707253, 410.06137016],
[109.81963349, 130.59749777, 133.6424814 ],
...,
[135.34174505, 136.28586705, 274.93549394],
[ 99.40810774, 118.1492825 , 209.42172325],
[ 67.79042163, 71.91370036, 103.08138812]], shape=(158731, 3))
KNN search in Python
GeoDataFrame so that it is easier to do further analyses with the data and create some nice maps out of the data.stop_kdt.query() command comes out as an array of lists, where each item (list) contains three values that show the distances between three nearest stops and the given building.GeoDataFrame containing the building data, we need to transpose our arrays.KNN search in Python
# Make a copy
k_nearest = building_points.copy()
# Add indices of nearest stops
k_nearest["1st_nearest_idx"] = k_nearest_ix.T[0]
k_nearest["2nd_nearest_idx"] = k_nearest_ix.T[1]
k_nearest["3rd_nearest_idx"] = k_nearest_ix.T[2]
# Add distances
k_nearest["1st_nearest_dist"] = k_nearest_dist.T[0]
k_nearest["2nd_nearest_dist"] = k_nearest_dist.T[1]
k_nearest["3rd_nearest_dist"] = k_nearest_dist.T[2]| name | geometry | 1st_nearest_idx | 2nd_nearest_idx | 3rd_nearest_idx | 1st_nearest_dist | 2nd_nearest_dist | 3rd_nearest_dist | |
|---|---|---|---|---|---|---|---|---|
| 0 | None | POINT (381166.6 6676424.438) | 1131 | 1135 | 1125 | 92.679893 | 461.438204 | 466.169150 |
| 1 | Uimastadion | POINT (385236.565 6674238.472) | 467 | 465 | 475 | 400.243370 | 409.497073 | 410.061370 |
| 2 | None | POINT (386317.478 6672100.648) | 61 | 64 | 13 | 109.819633 | 130.597498 | 133.642481 |
| 3 | Hartwall Arena | POINT (385225.109 6676120.56) | 532 | 533 | 506 | 104.632434 | 137.706391 | 377.331985 |
| 4 | Talli | POINT (385079.733 6676989.745) | 496 | 497 | 498 | 472.248282 | 519.685534 | 551.358778 |
KNN search in Python
GeoDataFrames that correspond to the nearest, second nearest and third nearest stops from the buildings.stop_index as a column which allows us to easily merge the data between stops and k_nearest (buildings) GeoDataFrames..merge() function in which we use the 1st_nearest_idx, 2nd_nearest_idx and 3rd_nearest_idx as the key on the left GeoDataFrame, while the stop_index is the key on the right GeoDataFrame.suffixes=('', '_knearest) argument to the .merge() method| name | geometry | 1st_nearest_idx | 2nd_nearest_idx | 3rd_nearest_idx | 1st_nearest_dist | 2nd_nearest_dist | 3rd_nearest_dist | stop_index | geometry_knearest | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | None | POINT (381166.6 6676424.438) | 1131 | 1135 | 1125 | 92.679893 | 461.438204 | 466.16915 | 1131 | POINT (381256.66 6676446.317) |
| 1 | Uimastadion | POINT (385236.565 6674238.472) | 467 | 465 | 475 | 400.243370 | 409.497073 | 410.06137 | 467 | POINT (384973.331 6674539.973) |
KNN search in Python
| name | geometry | 1st_nearest_idx | 2nd_nearest_idx | 3rd_nearest_idx | 1st_nearest_dist | 2nd_nearest_dist | 3rd_nearest_dist | stop_index | geometry_knearest | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | None | POINT (381166.6 6676424.438) | 1131 | 1135 | 1125 | 92.679893 | 461.438204 | 466.16915 | 1135 | POINT (381625.316 6676474.488) |
| 1 | Uimastadion | POINT (385236.565 6674238.472) | 467 | 465 | 475 | 400.243370 | 409.497073 | 410.06137 | 465 | POINT (385270.775 6674646.538) |
KNN search in Python
| name | geometry | 1st_nearest_idx | 2nd_nearest_idx | 3rd_nearest_idx | 1st_nearest_dist | 2nd_nearest_dist | 3rd_nearest_dist | stop_index | geometry_knearest | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | None | POINT (381166.6 6676424.438) | 1131 | 1135 | 1125 | 92.679893 | 461.438204 | 466.16915 | 1125 | POINT (381632.74 6676429.668) |
| 1 | Uimastadion | POINT (385236.565 6674238.472) | 467 | 465 | 475 | 400.243370 | 409.497073 | 410.06137 | 475 | POINT (385122.17 6674632.254) |
KNN search in Python
Now we can create LineString geometries connecting these Point objects to each other:
from shapely import LineString
# Generate LineStrings connecting the building point and K-nearest neighbor
k_nearest_1["geometry"] = k_nearest_1.apply(
lambda row: LineString([row["geometry"], row["geometry_knearest"]]), axis=1
)
k_nearest_2["geometry"] = k_nearest_2.apply(
lambda row: LineString([row["geometry"], row["geometry_knearest"]]), axis=1
)
k_nearest_3["geometry"] = k_nearest_3.apply(
lambda row: LineString([row["geometry"], row["geometry_knearest"]]), axis=1
)
k_nearest_1.head(2)| name | geometry | 1st_nearest_idx | 2nd_nearest_idx | 3rd_nearest_idx | 1st_nearest_dist | 2nd_nearest_dist | 3rd_nearest_dist | stop_index | geometry_knearest | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | None | LINESTRING (381166.6 6676424.438, 381256.66 66... | 1131 | 1135 | 1125 | 92.679893 | 461.438204 | 466.16915 | 1131 | POINT (381256.66 6676446.317) |
| 1 | Uimastadion | LINESTRING (385236.565 6674238.472, 384973.331... | 467 | 465 | 475 | 400.243370 | 409.497073 | 410.06137 | 467 | POINT (384973.331 6674539.973) |
KNN search in Python
In the following, we select a specific building and the closest stops from that building. The name column contains information about the names of the buildings which we can use to choose a building of our interest for visualization:
KNN search in Python
Thus, let’s filter the data for Hartwall Arena and create an interactive map out of the results, showing the three closest stops indicated with different colors:
# Visualize 3 nearest stops to
selected_name = "Hartwall Arena"
m = k_nearest_1.loc[k_nearest_1["name"] == selected_name].explore(
color="red", tiles="CartoDB Positron", max_zoom=16
)
m = k_nearest_2.loc[k_nearest_2["name"] == selected_name].explore(m=m, color="orange")
m = k_nearest_3.loc[k_nearest_3["name"] == selected_name].explore(m=m, color="blue")
m = stops.explore(m=m, color="green")
mNearest neighbors within radius
KDTree index for both datasets (buildings and stops) and then use a .query_ball_tree() method to find all neighbors within the radius r:8377
Nearest neighbors within radius
Now we can easily store the building indices as a new column to the stops GeoDataFrame:
| stop_name | stop_lat | stop_lon | stop_id | geometry | stop_index | building_ids_within_range | |
|---|---|---|---|---|---|---|---|
| 0 | Ritarihuone | 60.169460 | 24.956670 | 1010102 | POINT (386623.301 6672037.884) | 0 | [1129, 1130, 1155, 2054, 2055, 2056, 2057, 205... |
| 1 | Kirkkokatu | 60.171270 | 24.956570 | 1010103 | POINT (386623.991 6672239.572) | 1 | [1130, 2054, 2055, 2057, 2058, 2059, 2066, 206... |
| 2 | Kirkkokatu | 60.170293 | 24.956721 | 1010104 | POINT (386629 6672130.538) | 2 | [1129, 1130, 2054, 2055, 2056, 2057, 2058, 205... |
| 3 | Vironkatu | 60.172580 | 24.956554 | 1010105 | POINT (386627.617 6672385.448) | 3 | [2060, 2062, 2063, 2064, 2065, 2066, 2067, 206... |
| 4 | Vironkatu | 60.172990 | 24.956380 | 1010106 | POINT (386619.379 6672431.394) | 4 | [1136, 2060, 2061, 2062, 2063, 2064, 2065, 206... |
Nearest neighbors within radius
With this information, we can for example calculate the number of buildings within 200 meters from each stop. To do this, we can create a simple lambda function that checks the length of the id-list and store the result into column building_cnt:
| stop_name | stop_lat | stop_lon | stop_id | geometry | stop_index | building_ids_within_range | building_cnt | |
|---|---|---|---|---|---|---|---|---|
| 0 | Ritarihuone | 60.169460 | 24.956670 | 1010102 | POINT (386623.301 6672037.884) | 0 | [1129, 1130, 1155, 2054, 2055, 2056, 2057, 205... | 50 |
| 1 | Kirkkokatu | 60.171270 | 24.956570 | 1010103 | POINT (386623.991 6672239.572) | 1 | [1130, 2054, 2055, 2057, 2058, 2059, 2066, 206... | 69 |
| 2 | Kirkkokatu | 60.170293 | 24.956721 | 1010104 | POINT (386629 6672130.538) | 2 | [1129, 1130, 2054, 2055, 2056, 2057, 2058, 205... | 56 |
| 3 | Vironkatu | 60.172580 | 24.956554 | 1010105 | POINT (386627.617 6672385.448) | 3 | [2060, 2062, 2063, 2064, 2065, 2066, 2067, 206... | 74 |
| 4 | Vironkatu | 60.172990 | 24.956380 | 1010106 | POINT (386619.379 6672431.394) | 4 | [1136, 2060, 2061, 2062, 2063, 2064, 2065, 206... | 78 |
Nearest neighbors within radius
By calculating simple statistics from the building_cnt column, we can see that on average there are 32.2 buildings within 200 meters from the public transport stops and the maximum number of buildings within this distance is whopping 181 buildings.
Exercise
There is also an alternative approach for making a radius query by calculating a buffer around the stop points and then making a spatial join between these Polygon geometries and the buildings. This approach also allows to make queries between other type of geometries than Points.
stop_id.